24 research outputs found

    JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction

    We present a new parallel corpus, the JHU FLuency-Extended GUG corpus (JFLEG), for developing and evaluating grammatical error correction (GEC). Unlike other corpora, it represents a broad range of language proficiency levels and uses holistic fluency edits to not only correct grammatical errors but also make the original text more native sounding. We describe the types of corrections made and benchmark four leading GEC systems on this corpus, identifying specific areas in which they do well and how they can improve. JFLEG fulfills the need for a new gold standard to properly assess the current state of GEC. Comment: To appear in EACL 2017 (short papers).

    Monolingual Sentence Rewriting as Machine Translation: Generation and Evaluation

    In this thesis, we investigate approaches to paraphrasing entire sentences within the constraints of a given task, which we call monolingual sentence rewriting. We introduce a unified framework for monolingual sentence rewriting, and apply it to three representative tasks: sentence compression, text simplification, and grammatical error correction. We also perform a detailed analysis of the evaluation methodologies for each task, identify bias in common evaluation techniques, and propose more reliable practices. Monolingual rewriting can be thought of as translating between two types of English (such as from complex to simple), and therefore our approach is inspired by statistical machine translation. In machine translation, a large quantity of parallel data is necessary to model the transformations from input to output text. Parallel bilingual data naturally occurs between common language pairs (such as English and French), but for monolingual sentence rewriting, there is little existing parallel data and annotation is costly. We modify the statistical machine translation pipeline to harness monolingual resources and insights into task constraints in order to drastically diminish the amount of annotated data necessary to train a robust system. Our method generates more meaning-preserving and grammatical sentences than earlier approaches and requires less task-specific data. Once candidate sentences are generated, it is crucial to have reliable evaluation methods. Sentential paraphrases must fulfill a variety of requirements: preserve the meaning of the original sentence, be grammatical, and meet any stylistic or task-specific constraints. We analyze common evaluation practices and propose better methods that more accurately measure the quality of output. 
    Often overlooked, robust automatic evaluation methodology is necessary for improving systems, and this work presents new metrics and outlines important considerations for reliably measuring the quality of the generated text.

    dolls/puppets as miniatures - more than small

    Additional editors: Jana Mikota, Philipp Schmerheim. The focus topic of the second edition of the journal denkste: puppe / just a bit of: doll (de:do), a multidisciplinary, peer-reviewed online journal for human-doll discourses, is: dolls/puppets as miniatures - more than small. Dolls/puppets and their contexts claim to be ‘more’ than just miniaturized variants or replicas of human worlds. It is not by chance that they are regarded as a ‘place to find greatness’ (Bachelard). As ‘small formats’, they generate images and narratives of their own kind, which are open in function and effect: they oscillate between representation, condensation, and transformation of reality; express the longings and/or control needs of their creators; trigger enchantment, amazement, or alienation; and enable holistic access to the world and insight into inner connections. Once again, dolls/puppets prove to be miniatures and, in the context of miniaturized worlds, hybrid objects charged with all sorts of symbolism and excess of meaning. The synopsis of the highly diverse contributions in this issue gives an idea of possible tensions between ‘small’ and ‘large’, ‘visible’ and ‘hidden’, ‘reality’ and ‘fiction’, ‘mimesis’ and ‘poetics’. The heterogeneous range of topics underlines the subtle significance of the doll/puppet as a special trademark of the ‘small form’ in many disciplines. The contributions come from subjects and interdisciplinary fields as diverse as archaeology, anthropology, folklore studies, children’s and youth literature, art history, toy studies, animated film, fine arts, fashion design, and forensics. An interview with a young artist, miscellanea, and reviews complete the variety of topics.

    Data-driven sentence simplification: Survey and benchmark

    Sentence Simplification (SS) aims to modify a sentence in order to make it easier to read and understand. To do so, several rewriting transformations can be performed, such as replacement, reordering, and splitting. Executing these transformations while keeping sentences grammatical, preserving their main idea, and generating simpler output is a challenging and still far-from-solved problem. In this article, we survey research on SS, focusing on approaches that attempt to learn how to simplify using corpora of aligned original-simplified sentence pairs in English, which is the dominant paradigm nowadays. We also include a benchmark of different approaches on common datasets so as to compare them and highlight their strengths and limitations. We expect that this survey will serve as a starting point for researchers interested in the task and help spark new ideas for future developments.